
Conversation

@pragupta (Collaborator) commented Sep 9, 2025

rocm_base: 681e60e

Tested this PR on MI300x using registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16623_ubuntu24.04_py3.12_pytorch_rocm7.1_internal_testing_681e60e1

Ran the following UTs:
test_nn, test_torch, test_cuda, test_ops, test_unary_ufuncs, test_autograd, inductor/test_torchinductor

All ran fine, attaching logs!
default_ut.log

Successful wheel build job with this branch: http://rocm-ci.amd.com/view/preview/job/pytorch2.8-manylinux-wheels-preview/116/

pytorchmergebot and others added 30 commits September 4, 2025 13:13
…lt (pytorch#159889)"

This reverts commit 4ae57d4.

Reverted pytorch#159889 on behalf of https://github.com/jeanschmidt due to Failing internal tests, probably typechecks. See D81588399 ([comment](pytorch#159889 (comment)))
On Zen 2 (AMD EPYC) and Intel Sapphire Rapids this fails with small differences when compiled with native-targeted optimizations, e.g. it fails with `-march=znver2` but succeeds with `-march=znver1`.

I assume some operator fusing is being used by GCC. Small differences like using `vmovdqa` can be seen in the minimized code of the baddbmm kernel: https://godbolt.org/z/jsxMa91Wb

The greatest differences are consistent and the same on both CPU architectures:
```
Greatest absolute difference: 3.43852152582258e-05 at index (1, 2, 1) (up to 1e-05 allowed)
Greatest relative difference: 3.6034286949870875e-06 at index (1, 2, 1) (up to 1.3e-06 allowed)
```

Hence I assume this is within the expected tolerances, especially as `complex128` and all other types pass.
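
For reference, a check with tolerances loosened to cover the reported deltas could look like this (a sketch only; the tensors are stand-ins and the actual test change may differ):

```python
import torch

# Illustrative tensors standing in for the baddbmm result and its reference.
actual = torch.randn(2, 3, 4, dtype=torch.complex64)
expected = actual + 3.5e-05  # perturbation on the order of the reported difference

# Default complex64 tolerances (rtol=1.3e-06, atol=1e-05) would reject this;
# slightly loosened tolerances accept it.
torch.testing.assert_close(actual, expected, rtol=4e-06, atol=4e-05)
```
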
Pull Request resolved: pytorch#152424
Approved by: https://github.com/malfet
This reverts commit 90b0864.

Reverted pytorch#160449 on behalf of https://github.com/jeanschmidt due to Already discussed with @ezyang about the internal quirks and errors ([comment](pytorch#160449 (comment)))
Many users want a config that forces all CUDA ops to be captured by cudagraphs; when that is not possible, PT2 should error.

This PR adds `torch._inductor.triton.cudagraph_or_error` for that (default: False), along with an environment variable `TORCHINDUCTOR_CUDAGRAPH_OR_ERROR` to control it.
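
A minimal sketch of turning the toggle on, assuming the flag is exposed on the inductor config under the `triton` namespace (the exact location is inferred from the description above):

```python
import os

# Option 1: the environment variable mentioned above, set before compilation.
os.environ["TORCHINDUCTOR_CUDAGRAPH_OR_ERROR"] = "1"

import torch
import torch._inductor.config as inductor_config

# Option 2: the config flag directly (assumed to live under the `triton` namespace).
inductor_config.triton.cudagraph_or_error = True

@torch.compile(mode="reduce-overhead")  # mode that uses cudagraphs
def f(x):
    return x.sin() + 1

out = f(torch.randn(8, device="cuda"))  # errors if any op cannot be captured
```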

Pull Request resolved: pytorch#161862
Approved by: https://github.com/ezyang, https://github.com/mlazos
…ytorch#162044)"

This reverts commit cd529b6.

Reverted pytorch#162044 on behalf of https://github.com/jeffdaily due to mi200 backlog is purged, and mi300 runners are failing in GHA download ([comment](pytorch#162044 (comment)))
…h#161907)

`CMAKE_PREFIX_PATH` is a list of paths used to find dependencies. The test overwrites it with a single path, causing dependencies such as protobuf or Abseil not to be found.

Instead, prepend the path to the existing value.
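
The prepend pattern in environment-variable form (a sketch; the extra path is illustrative):

```python
import os

extra_prefix = "/tmp/aoti_test_install"  # illustrative path to prepend

existing = os.environ.get("CMAKE_PREFIX_PATH", "")
# Prepend instead of overwriting, so protobuf, Abseil, etc. remain discoverable.
os.environ["CMAKE_PREFIX_PATH"] = (
    extra_prefix + (os.pathsep + existing if existing else "")
)
```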

This fixes a test failure:
> pytorch-v2.7.1/test/inductor/test_aot_inductor_package.py", line 242, in test_compile_after_package
>    self.assertTrue(so_path.exists())
> AssertionError: False is not true

Caused by:
```
/software/binutils/2.42-GCCcore-13.3.0/bin/ld: cannot find -labsl::utility: No such file or directory
/software/binutils/2.42-GCCcore-13.3.0/bin/ld: cannot find -labsl::variant: No such file or directory
collect2: error: ld returned 1 exit status
```

Pull Request resolved: pytorch#161907
Approved by: https://github.com/Skylion007
I found a number of places that seem to want forwarding references, but the type signatures do not reflect that.

Pull Request resolved: pytorch#161094
Approved by: https://github.com/malfet
…ch#158747)

This is a part of our effort for integrating Composable Kernel library for Inductor backend. Currently we have a submodule, but would prefer to have commit pin control over the library as with Triton. We intentionally avoid putting all installation logic in CI scripts to allow locally built versions to have this functionality.

The idea is to have CK as a pytorch dependency in pytorch 2.9 release to allow people to use it with inductor and AOT inductor and then gradually step away from submodule usage. Right now CK usage in SDPA/Gemm is tied to submodule files.

This PR is a remake of pytorch#156192, due to a branch error.

Pull Request resolved: pytorch#158747
Approved by: https://github.com/jeffdaily

Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jack Taylor <[email protected]>
Co-authored-by: Max Podkorytov <[email protected]>
Co-authored-by: Copilot <[email protected]>
[PEP 735](https://peps.python.org/pep-0735) introduces the [dependency-groups] table for a number of use cases, one of which is specifying development dependencies for projects.

Pull Request resolved: pytorch#161216
Approved by: https://github.com/seemethere
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@83c5a5](intel/torch-xpu-ops@83c5a5a), which includes:

- Revert "Disable xccl timer avoid drlm hang" because the XPU time event issue has been fixed
- Fallback lu_factor kernel to CPU for single batch
- Enable aten::linalg_inv and aten::linalg_inv_ex on XPU
Pull Request resolved: pytorch#162062
Approved by: https://github.com/EikanWang
)

This PR implements the semantics change to `torch._dynamo.error_on_graph_break`:
- ~`torch.compile` now has a new `error_on_graph_break` kwarg that serves as a lower-priority toggle for erroring/continuing on graph breaks~
- `error_on_graph_break` is a new internal `torch.compile` setting that is lower priority than `fullgraph`. It allows the user to toggle erroring/continuing on graph breaks.
- `error_on_graph_break` does nothing when `fullgraph=True`
- `error_on_graph_break` does NOT guarantee a single graph

Follow-up [DONE]: the programming model docs need to be updated to reflect the three graph-break modes for compilation:
- `fullgraph=True`: enforce one graph, no graph breaks; cannot be toggled
- `fullgraph=False, error_on_graph_break=True`: error on graph breaks; `error_on_graph_break` can be toggled at compile time
- `fullgraph=False, error_on_graph_break=False`: resume tracing on graph breaks; `error_on_graph_break` can be toggled at compile time
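
A sketch of the three modes (only the `fullgraph` kwarg and the defaults are taken from the description above; the exact spelling of the toggle is an assumption):

```python
import torch

def fn(x):
    torch._dynamo.graph_break()  # deliberate graph break for illustration
    return x + 1

x = torch.randn(4)

# fullgraph=True: always errors at the break; cannot be toggled.
# torch.compile(fn, fullgraph=True)(x)  # would raise

# fullgraph=False with error_on_graph_break left False (default): tracing
# resumes after the break, producing more than one graph.
torch.compile(fn, fullgraph=False)(x)

# fullgraph=False with error_on_graph_break toggled on; the toggle is assumed
# to be usable roughly like this, per the torch._dynamo.error_on_graph_break
# name in the description:
# with torch._dynamo.error_on_graph_break(True):
#     torch.compile(fn, fullgraph=False)(x)  # would raise at the graph break
```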

Pull Request resolved: pytorch#161747
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#161739
…he CUDACachingAllocator (pytorch#158352)

## Introduction

During CUDA Graph capture, the CUDA caching allocator currently defers reclaiming blocks until capture ends. This is because CUDA forbids querying events recorded during capture (CUDA operations are not executed during the capture stage), so the allocator cannot use its normal event-based logic. However, capture records a DAG of work (we call it the **capturing graph**). We can use the capturing graph to determine when a block's old lifetime is fully ordered before future work, and safely reuse the block within the same capture.

This PR adds an experimental flag `graph_capture_record_stream_reuse: True|False (default: False)`. When enabled, the allocator inserts lightweight free markers and uses capture ordering to decide if a freed block is safe to reuse during capture. If the proof cannot be established, we fall back to the existing post-capture path.
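
A hedged sketch of opting in, assuming the flag is surfaced through `PYTORCH_CUDA_ALLOC_CONF` like other caching-allocator options (the exact plumbing is not spelled out here):

```python
import os

# Assumption: the experimental flag is passed like other caching-allocator options.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "graph_capture_record_stream_reuse:True"

import torch

static_in = torch.randn(1 << 20, device="cuda")

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    tmp = static_in * 2   # allocated during capture
    out = tmp.sum()
    del tmp               # freed during capture; with the flag on, the block may be
                          # reused within this same capture if the per-stream rule
                          # described below can be proven
g.replay()
```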

## Terms

* **Free marker**: A capture-legal no-op (created with `cudaGraphAddEmptyNode`) inserted after the last captured use of the block on each stream that used it.
* **Terminal**: The set of the latest operations of the stream (or of the capturing graph). Any newly captured op on that stream will attach after all nodes in this set. For a stream currently capturing, it is the set of nodes returned in `dependencies_out` by `cudaStreamGetCaptureInfo`.

## When can we reuse a block during capture?

### Strong Rule (Graph-Wide Safety)

This rule provides a universal guarantee that a block is safe for reuse by any stream in the graph.

> A block is safe to reuse if every free marker is a predecessor of every terminal of all active streams in the graph.

Why it's safe:

This rule establishes a strict global ordering. Since any new operation on any stream must be appended after that stream's terminals, this condition guarantees that the block's new lifetime begins only after its old lifetime has completely ended everywhere. This prevents lifetime overlaps when the graph is replayed, ensuring correctness.

### Per-stream Rule (A Practical Optimization)

The strong rule, while safe, is often unnecessarily restrictive. The `DeviceCachingAllocator` introduces a crucial constraint that allows for a simpler check.

In `DeviceCachingAllocator`, `get_free_block` only returns blocks whose `block->stream == p.stream()`. In other words, we never reuse a block on a stream different from the allocation stream. This means we don't need to verify safety across the entire graph. We only need to confirm that the block is safe to reuse from the perspective of its own allocation stream.

> Reuse a block for allocations on stream S if every free marker is a predecessor of every node in the terminal set of S.

In short, a block is considered **reusable** on stream S as long as all markers marking it "free" are guaranteed to complete before any new work that might need it on stream S begins.
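
As a small illustration, the per-stream check boils down to a predicate like this (pure-Python sketch; `is_predecessor` stands in for reachability in the capturing graph):

```python
def reusable_on_stream(free_markers, terminal_of_s, is_predecessor):
    # Per-stream rule: every free marker must be a predecessor of every node
    # in the terminal set of the allocation stream S.
    return all(
        is_predecessor(marker, terminal)
        for marker in free_markers
        for terminal in terminal_of_s
    )
```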

## Implementation

* On `free(block)` during capture
  * For each stream in `block->stream_uses` and the allocation stream, insert a free marker (empty node) and make it that stream’s tail.
  * If we cannot place markers for all such streams (for example, a stream is not in capture), defer to the post-capture path.
  * Otherwise, store the marker handles and keep the block in the capture-private structures.
* On `allocate(stream)` during capture (attempt per-stream reclaim)
  * Query the allocation stream S’s terminal via `cudaStreamGetCaptureInfo`.
  * For each deferred block, check whether it is allocated on this stream, and each of its free markers is a predecessor of the terminal.
    * If yes, hand the block to S for immediate reuse within the same capture.
    * If no, keep it deferred; it will be reconsidered as capture progresses and S’s terminal advances.
* On capture end
  * Any still-deferred blocks follow the existing post-capture reclamation (event insertion/polling). External behavior remains unchanged if we cannot prove safety during capture.

## Examples (2 streams)

<img width="641" height="801" alt="pytorch-remove-cudagraph-defer-reclaiming (6)" src="https://github.com/user-attachments/assets/41adc835-d448-483b-99ba-b4341cb7d2a2" />

* Case 0 — Unsafe
The two frees are not ordered with respect to each other. For stream 1, the other stream’s free marker does not precede this stream’s terminal, so the per-stream condition fails.
Counterexample intuition for the unsafe setups: imagine `f2(x)` runs for a long time. If DeviceCachingAllocator reused block `x` on a stream whose terminal is not ordered after the free markers, the new lifetime could overlap the old one on replay, risking use-after-free or data corruption. The per-stream rule prevents exactly this.
* Case 1 — Reusable on stream 1
Stream 1’s terminal is after both frees, so every free marker precedes stream 1’s terminal. The block is reusable for allocations on stream 1.
* Case 2 — Not reusable on stream 2, but this cannot occur in `DeviceCachingAllocator`
This depicts reusing the block on stream 2 while stream 1’s free is not yet ordered before stream 2’s terminal. Though the block is not safe to reuse on stream 2, DeviceCachingAllocator will not choose that block for stream 2 anyway: `get_free_block` rejects blocks whose `stream != p.stream()`. So this case is unreachable.
* Case 3 — Safe (strong rule holds)
In this scenario, the terminal nodes of all streams are positioned after the block's free markers, satisfying the strong rule. This guarantees the block is safe for reuse by any stream in the capturing graph. However, since `DeviceCachingAllocator` only reuses a block on its original allocation stream, verifying this strong condition is unnecessary. We only need to ensure the per-stream rule is met for the specific stream requesting the block.
* Case 4 — Freeing after a join
See the note below.

## Edge Case: Freeing after a join

Our current dependency tracking has a limitation in scenarios where a block is freed after a stream join; see @galv's [comments here](pytorch#158352 (review)).

In case 4, we have a missed opportunity. Because the block's usage is not explicitly marked, we cannot tell that its actual last use may have occurred much earlier, long before the join, so we must wait for the join before the block can be reused.

## Thanks
Thanks to @galv for his great idea around graph parsing and empty nodes.

Pull Request resolved: pytorch#158352
Approved by: https://github.com/ngimel, https://github.com/eqy

Co-authored-by: Jeff Daily <[email protected]>
…orch#161984)

Added a helper API to tell whether the world is entirely within a P2P domain or crosses the network.
This is mainly for nblocks tuning purposes (in later PRs).

Pull Request resolved: pytorch#161984
Approved by: https://github.com/ngimel
ghstack dependencies: pytorch#161983
so that the signal calls do not step on each other's toes.

Pull Request resolved: pytorch#162026
Approved by: https://github.com/ngimel
…161407)

Summary:

Creates a fallback path for `torch._grouped_mm`, using the naive for
loop implementation (or bmm).
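
For intuition, a hedged sketch of what such a naive fallback computes for the 3d-3d case (shapes, dtype, and device are illustrative; this is not the kernel added by the PR):

```python
import torch

def naive_grouped_mm_3d_3d(mat_a, mat_b):
    # mat_a: [G, M, K], mat_b: [G, K, N] -> out: [G, M, N], one mm per group.
    return torch.stack([torch.mm(a, b) for a, b in zip(mat_a, mat_b)])

a = torch.randn(4, 8, 16, device="cuda", dtype=torch.bfloat16)
b = torch.randn(4, 16, 32, device="cuda", dtype=torch.bfloat16)

# For this case the loop matches a batched matmul.
torch.testing.assert_close(naive_grouped_mm_3d_3d(a, b), torch.bmm(a, b))
```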

For the sake of keeping the PR small, this PR only enables SM80+ (CUDA
capability 8.0 and up), since I am testing this on an A100 machine. In
future PRs, we can increase the coverage of the fallback to:
1. float32 and float16, which will extend the GPU coverage
2. cpu

Test Plan:

```bash
pytest test/test_matmul_cuda.py -s -k test_grouped_gemm_2d_3d -x
pytest test/test_matmul_cuda.py -s -k test_grouped_gemm_3d_2d -x
pytest test/test_matmul_cuda.py -s -k test_grouped_gemm_2d_2d -x
pytest test/test_matmul_cuda.py -s -k test_grouped_gemm_3d_3d -x
```

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: pytorch#161407
Approved by: https://github.com/drisspg, https://github.com/eqy
…61717)

Summary:

Moves the `torch._grouped_mm` fallback from cuda-only code to a place
where it can be used by multiple backends. Specifically:
1. make the fallback path and util functions reusable and move them to
   `ATen/native/GroupedMMUtils.h`
2. register a backend-agnostic kernel to composite explicit autograd key
3. refactor the grouped_mm tests to their own test case and enable CPU

At the end of this PR, here is the support matrix:
* CUDA SM90+: fast path with test coverage (no change)
* CUDA SM80+: fallback with test coverage (no change)
* CPU: fallback works, but without test coverage (new in this PR)
* other SM versions and other backends: will probably already work, but
  let's leave this to future PRs
* float32/float16: will probably already work, but let's leave this to
  future PRs

Test Plan:

```bash
pytest test/test_matmul_cuda.py -s -k test_grouped_gemm -x
```

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: pytorch#161717
Approved by: https://github.com/ngimel, https://github.com/drisspg
ghstack dependencies: pytorch#161407
…62059)

Summary:

Enables `torch.float32` and `torch.float16` options in
`torch._grouped_mm`. Note that the fast path is only enabled if `mat_a`,
`mat_b`, and `out_dtype` are `torch.bfloat16`.

Saving for future PRs:
1. enabling testing on more platforms
2. supporting out_dtype != mat_a.dtype
3. opinfo
4. better compile support

Test Plan:

```bash
// on A100 and H100
pytest test/test_matmul_cuda.py -s -k test_grouped_gemm -x
// on H100
pytest test/test_matmul_cuda.py -s -k test_scaled_grouped_gemm -x
```

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: pytorch#162059
Approved by: https://github.com/ngimel, https://github.com/eqy
ghstack dependencies: pytorch#161407, pytorch#161717
I don't have a failing test case, but I just saw an extra guard somewhere.

Pull Request resolved: pytorch#162105
Approved by: https://github.com/williamwen42, https://github.com/StrongerXi, https://github.com/jansel
…pytorch#161688)

Fixes pytorch#161080
`torch.export.export` fails with `TypeError: expand() got an unexpected keyword argument 'implicit'` when calling `torch.expand_copy(..., implicit=True)`. This happened because `expand_copy = _make_copy_from_view(aten.expand)` registers `aten.expand` as the decomposition path for `aten.expand_copy`, and `aten.expand` doesn't accept the `implicit` argument.

I have added an explicit decomposition for `aten.expand_copy` in `torch/_decomp/decompositions.py` that ignores the `implicit` argument, and a simple unit test to demonstrate the bug being fixed.
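
A hedged sketch of the shape of such a decomposition (the actual registration added by the PR may differ; the check below only verifies the behavior):

```python
import torch

aten = torch.ops.aten

# Shape of the fix: accept (and ignore) `implicit`, then expand and materialize a copy.
def expand_copy_decomp(self, size, *, implicit=False):
    return aten.expand(self, size).clone()

x = torch.randn(3, 1)
torch.testing.assert_close(
    expand_copy_decomp(x, (3, 4), implicit=True),
    aten.expand_copy(x, (3, 4)),
)
```
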
Pull Request resolved: pytorch#161688
Approved by: https://github.com/angelayi, https://github.com/can-gaa-hou
pytorch#161951)

…h.is_complex.

The PR proposes adding a simple, self-explanatory example to the documentation page. The example demonstrates the function's output for tensors with various data types, showing both True and False return values.
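
In that spirit, an illustrative example (the wording on the rendered docs page may differ):

```python
import torch

torch.is_complex(torch.tensor([1 + 2j], dtype=torch.complex64))  # True
torch.is_complex(torch.tensor([1.0, 2.0], dtype=torch.float64))  # False
torch.is_complex(torch.tensor([1, 2], dtype=torch.int32))        # False
```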

Fixes pytorch#161859

Pull Request resolved: pytorch#161951
Approved by: https://github.com/zou3519
Update cpp-httplib for better error handling, bug fixes, and performance. Header-only library update.
Pull Request resolved: pytorch#162181
Approved by: https://github.com/jansel
Summary:
att
Test Plan:
ci
Rollback Plan:

Reviewed By: minjang

Differential Revision: D80828148

Pull Request resolved: pytorch#161798
Approved by: https://github.com/minjang, https://github.com/SherlockNoMad
This reverts commit 2c03f0a.

Reverted pytorch#162007 on behalf of https://github.com/jeanschmidt due to Breaks internal builds see [D81588372](https://www.internalfb.com/diff/D81588372), @malfet may you help the author? ([comment](pytorch#162007 (comment)))
This reverts commit b40d943.

Reverted pytorch#162001 on behalf of https://github.com/jeanschmidt due to break a few internal tests ([comment](pytorch#161999 (comment)))

rocm-repo-management-api bot commented Sep 9, 2025

Jenkins build for ab5575833f1eb9066df192dd91d9d7bd43385f65 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Sep 9, 2025

Jenkins build for 60644390d3e6c3da6228427bb14c6b759011f97a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Sep 9, 2025

Jenkins build for 304889c9da6276844081450426d1722846961f6c commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@pruthvistony (Collaborator) left a comment

Rubber-stamping the PR. Build is successful and conflicts have been resolved.

@pragupta pragupta force-pushed the rocm7.1_internal_testing_IFU_2025-09-09 branch from 304889c to dba8539 on September 10, 2025 18:44

rocm-repo-management-api bot commented Sep 10, 2025

Jenkins build for 9a66f82081053fff8105de82cd5f4593c393caf1 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@pragupta pragupta changed the title from "[AUTOGENERATED] rocm7.1_internal_testing_IFU_2025-09-09" to "[DO NOT MERGE] [AUTOGENERATED] rocm7.1_internal_testing_IFU_2025-09-09" on Sep 11, 2025
@pragupta (Collaborator, Author) commented

Do not merge yet; the vLLM team is testing with this branch, and if perf looks good we will merge it. Otherwise, wait till 9/15.

@@ -1,5 +1 @@
<<<<<<< HEAD
56765e8c1f6490e21312b46242ed78cb2dd46d35
Collaborator:

It will be updated to the new branch.

Collaborator (Author):

Yes, this will be handled separately.

if ((dims > 0) && (dims <= 2)) {
auto divmod = sizes_[0].divmod(linear_idx);
<<<<<<< HEAD
#pragma unroll
Collaborator:

If we merge this line, then at the next IFU it will not show up as a conflict.

Collaborator (Author):

Sorry, I don't follow. I chose the upstream change so that the next time we do an IFU it won't show up as a merge conflict. If I choose HEAD, then we are picking local changes, which may show up as a merge conflict.

return HIP_R_4F_E2M1;
#else
<<<<<<< HEAD
// Return HIP_R_4F_E2M1 enum value for earlier ROCm version.
Collaborator:

Same comment as above.
In the current scenario it is better to merge the <<HEAD block, to avoid conflicts at the next IFU.

case ConvBackend::Miopen:
case ConvBackend::MiopenDepthwise:
case ConvBackend::MiopenTranspose:
<<<<<<< HEAD
Collaborator:

Matching upstream, it is fine.

}

// TODO: Remove PYTORCH_MIOPEN_SUGGEST_NHWC once ROCm officially supports NHWC in MIOpen
<<<<<<< HEAD
Collaborator:

Matching upstream, it is fine.

@pruthvistony (Collaborator) left a comment

On NHWC batchnorm, we need Dmitry's confirmation.

Other conflicts are resolved properly, AFAIK.

@pragupta pragupta changed the title from "[DO NOT MERGE] [AUTOGENERATED] rocm7.1_internal_testing_IFU_2025-09-09" to "[AUTOGENERATED] rocm7.1_internal_testing_IFU_2025-09-09" on Sep 16, 2025
@jithunnair-amd (Collaborator) left a comment

LGTM. Hope other IFUs get shorter, both in time and diffs :)


rocm-repo-management-api bot commented Sep 17, 2025

Jenkins build for 9e7df766290def1ac0112fc758a6fa1ea126e95a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected an error during base docker image building:


#30 [stage-0 22/56] COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
#30 DONE 0.1s

#31 [stage-0 23/56] RUN bash ./install_rocm_magma.sh
#31 0.262 ./install_rocm_magma.sh: line 91: syntax error: unexpected end of file
#31 ERROR: process "/bin/sh -c bash ./install_rocm_magma.sh" did not complete successfully: exit code: 2
------
 > [stage-0 23/56] RUN bash ./install_rocm_magma.sh:
0.262 ./install_rocm_magma.sh: line 91: syntax error: unexpected end of file
------

@pragupta pragupta merged commit 3a58f35 into rocm7.1_internal_testing Sep 17, 2025
44 of 46 checks passed
@pragupta pragupta deleted the rocm7.1_internal_testing_IFU_2025-09-09 branch September 17, 2025 11:51
@pragupta pragupta restored the rocm7.1_internal_testing_IFU_2025-09-09 branch September 24, 2025 20:07
@pragupta pragupta deleted the rocm7.1_internal_testing_IFU_2025-09-09 branch September 24, 2025 21:27